Perceptual cost functions for unit searching in large corpus-based text-to-speech
Abstract
In large corpus-based concatenative Text-to-Speech, unit selection is critical for the quality of synthetic speech. Dynamic programming algorithms have been used for unit-searching by minimizing a total cost that combines (1) the cost between the target specification and candidate units and (2) the cost between candidate units for concatenation. The cost function is often a weighted sum of sub-costs, which are the costs for each of the acoustic and phonetic features of units. The weights control the individual contribution of the sub-costs to the total cost. They also determine the relative sensitivity of a feature to quality degradation when signal processing is applied to modify the feature. However, determining the weights for the cost function has not been a simple task. In this paper, we propose a new method for unit-searching based on a perceptual preference test. The proposed algorithm is designed to find the weights in a more systematic and meaningful way. The algorithm searches for a set of weights that produces a ranking of renditions that is close to the perceptual test results. The downhill simplex method is used for the multidimensional search of the weights. A dissimilarity measure is proposed to evaluate the closeness of two rankings. In about 83 percent of the cases, the unit selection algorithm using the optimal set of weights chooses the same rendition that human listeners prefer. The results show that the proposed weight optimization algorithm can successfully predict the human preference pattern. The synthetic speech using the optimal weights consistently showed smoother transitions and higher voice quality than the speech using manually determined weights.
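The weight search described in the abstract can be illustrated with a small sketch. Everything below is an assumption made for illustration: the rendition names, the sub-cost values, the listener ranking, and the dissimilarity measure (a sum of absolute rank-position differences) are all hypothetical, since the abstract does not give the paper's actual measure or data. The simplex routine is a simplified downhill simplex (reflection, expansion, inside contraction, shrink), not necessarily the variant the authors used.

```python
def total_cost(weights, sub_costs):
    """Weighted sum of sub-costs for one rendition."""
    return sum(w * c for w, c in zip(weights, sub_costs))

def ranking(weights, renditions):
    """Rendition names ordered from lowest to highest total cost."""
    return sorted(renditions, key=lambda name: total_cost(weights, renditions[name]))

def rank_dissimilarity(rank_a, rank_b):
    """Sum of absolute position differences (assumed measure, not the paper's)."""
    pos_b = {name: i for i, name in enumerate(rank_b)}
    return sum(abs(i - pos_b[name]) for i, name in enumerate(rank_a))

def downhill_simplex(f, x0, step=0.5, iters=200):
    """Simplified downhill simplex search; the best vertex is never discarded."""
    n = len(x0)
    # Initial simplex: the start point plus one offset vertex per dimension.
    simplex = [list(x0)] + [
        [x0[j] + (step if j == i else 0.0) for j in range(n)] for i in range(n)
    ]
    for _ in range(iters):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        centroid = [sum(v[j] for v in simplex[:-1]) / n for j in range(n)]

        def along(t):
            # Point on the line through the centroid and the worst vertex.
            return [centroid[j] + t * (worst[j] - centroid[j]) for j in range(n)]

        reflected = along(-1.0)
        if f(reflected) < f(best):
            expanded = along(-2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            contracted = along(0.5)
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Shrink every other vertex halfway toward the best one.
                simplex = [best] + [
                    [(best[j] + v[j]) / 2 for j in range(n)] for v in simplex[1:]
                ]
    return min(simplex, key=f)

# Hypothetical sub-costs (three features) for four renditions of one sentence.
RENDITIONS = {
    "r1": [0.1, 0.9, 0.5],
    "r2": [0.8, 0.2, 0.4],
    "r3": [0.5, 0.5, 0.9],
    "r4": [0.9, 0.8, 0.1],
}
# Hypothetical listener preference, most preferred first.
LISTENER_RANKING = ["r1", "r2", "r4", "r3"]

def objective(weights):
    # abs() keeps the effective weights non-negative (an assumption of this sketch).
    effective = [abs(w) for w in weights]
    return rank_dissimilarity(ranking(effective, RENDITIONS), LISTENER_RANKING)

start = [1.0, 1.0, 1.0]
best = downhill_simplex(objective, start)
print("dissimilarity before:", objective(start), "after:", objective(best))
```

Because a rank-based objective is piecewise constant, a gradient-free method such as the downhill simplex is a natural fit, but the search can stall on a plateau. Restarting from several initial weight vectors is one common way to reduce that risk.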
Similar resources
A text-to-speech platform for variable length optimal unit searching using perceptual cost functions
In concatenative Text-to-Speech, the size of the speech corpus is closely related to synthetic speech quality. In this paper, we describe our work on a new corpus-based Bell Labs' TTS system. This encompasses large acoustic inventories with a rich set of annotations, models and data structures for representing and managing such inventories, and an optimal unit selection algorithm that accommoda...
Study on Unit-Selection and Statistical Parametric Speech Synthesis Techniques
One of the interesting topics on multimedia domain is concerned with empowering computer in order to speech production. Speech synthesis is granting human abilities to the computer for speech production. Data-based approach and process-based approach are the two main approaches on speech synthesis. Each approach has its varied challenges. Unit-selection speech synthesis and statistical parametr...
Unit Selection Algorithm Using Bi-grams Model For Corpus-Based Speech Synthesis
In this paper, we present a novel statistical approach to corpus-based speech synthesis. Classically, phonetic information is defined and considered as acoustic reference to be respected. In this way, many studies were elaborated for acoustical unit classification. This type of classification allows separating units according to their symbolic characteristics. Indeed, target cost and concatenat...
L2 Learners’ Lexical Inferencing: Perceptual Learning Style Preferences, Strategy Use, Density of Text, and Parts of Speech as Possible Predictors
This study was intended first to categorize the L2 learners in terms of their learning style preferences and second to investigate if their learning preferences are related to lexical inferencing. Moreover, strategies used for lexical inferencing and text related issues of text density and parts of speech were studied to determine their moderating effects and the best predictors of lexical infe...
Hierarchical non-uniform unit selection based on prosodic structure
In speech synthesis systems based on wave concatenation, using longer units can generate more natural synthetic speech. In order to improve the usage of longer units in the corpus, this paper proposed a hierarchical non-uniform unit selection framework. Each layer included in the framework is an independent searching procedure which searches for different sized units and adopts suitable natural...
Publication date: 2001